451 research outputs found

Consistent Goal-Directed User Model for Realistic Man-Machine Task-Oriented Spoken Dialogue Simulation

Author: Pietquin Olivier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2006

Because of the great variability of factors to take into account, designing a spoken dialogue system is still a tailoring task. Rapid design and reuse of previous work are made very difficult. For these reasons, the application of machine learning methods to dialogue strategy optimization has become a leading research subject over the last decade. Yet techniques such as reinforcement learning are very demanding in training data, while obtaining a substantial amount of data in the particular case of spoken dialogues is time-consuming and therefore expensive. In order to expand existing data sets, dialogue simulation techniques are becoming a standard solution. In this paper we describe a user modeling technique for realistic simulation of man-machine goal-directed spoken dialogues. This model, based on a stochastic description of man-machine communication, unlike previously proposed models, is consistent along the interaction according to its history and a predefined user goal.

HAL-CentraleSupelec

Crossref

HAL-Rennes 1

Learning to ground in spoken dialogue systems

Author: Pietquin Olivier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

Machine learning methods such as reinforcement learning applied to dialogue strategy optimization have become a leading research subject since the mid-1990s. Indeed, the great variability of factors to take into account makes the design of a spoken dialogue system a tailoring task, and reuse of previous work is very difficult. Yet techniques such as reinforcement learning are very demanding in training data, while obtaining a substantial amount of data in the particular case of spoken dialogues is time-consuming and therefore expensive. In order to expand existing data sets, dialogue simulation techniques are becoming a standard solution. In this paper, we present a user model for realistic spoken dialogue simulation and a method for using this model to simulate the grounding process. This allows including grounding subdialogues as actions in the reinforcement learning process and learning adapted strateg

Optimising Spoken Dialogue Strategies within the Reinforcement Learning Paradigm

Author: Pietquin Olivier
Publication venue: 'IntechOpen'
Publication date: 01/01/2008

Un Cadre Probabiliste pour l'Optimisation des Systèmes de Dialogue

Author: Pietquin Olivier
Publication venue: HAL CCSD
Publication date: 01/03/2007

This article proposes a theoretical framework for the simulation and automatic optimization of man-machine spoken dialogue systems through unsupervised learning of strategies. The framework relies on a probabilistic description of spoken communication between human and machine. It makes it possible to work within the setting of Markov decision processes and to use reinforcement learning to search for an optimal strategy in a task-independent way. Two concrete applications of the proposed framework, to form filling and to database querying, are given to demonstrate its possible uses.

HAL-CentraleSupelec

HAL-Rennes 1

Is the Bellman residual a bad proxy?

Authors: Geist Matthieu, Pietquin Olivier, Piot Bilal
Publication date: 04/12/2017

This paper aims at theoretically and empirically comparing two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value and ii) minimization of the Bellman residual. For that purpose, we place ourselves in the framework of policy search algorithms, which are usually designed to maximize the mean value, and derive a method that minimizes the residual $\|T_* v_\pi - v_\pi\|_{1,\nu}$ over policies. A theoretical analysis shows how good this proxy is for policy optimization, and notably that it is better than its value-based counterpart. We also propose experiments on randomly generated generic Markov decision processes, specifically designed for studying the influence of the involved concentrability coefficient. They show that the Bellman residual is generally a bad proxy for policy optimization and that directly maximizing the mean value is much better, despite the current lack of deep theoretical analysis. This might seem obvious, as directly addressing the problem of interest is usually better, but given the prevalence of (projected) Bellman residual minimization in value-based reinforcement learning, we believe that this question is worth considering.

Comment: Final NIPS 2017 version (title, among other things, changed)
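As a rough illustration of the quantity named in the abstract above, the following sketch computes the ν-weighted ℓ1 optimal Bellman residual of a fixed policy on a tiny hand-made MDP. The two-state MDP, the function names, the uniform ν and the discount factor 0.9 are all assumptions chosen for illustration. The sketch only evaluates the residual for a given policy; it does not reproduce the paper's method, which minimizes it over policies.

```python
# Hypothetical 2-state, 2-action MDP (not taken from the paper).
# P[s][a] is the next-state distribution, R[s][a] the immediate reward.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[1.0, 0.0], [0.0, 1.0]],  # state 1: action 0 moves to state 0, action 1 stays
]
R = [
    [0.0, 1.0],
    [0.0, 2.0],
]
GAMMA = 0.9
NU = [0.5, 0.5]  # assumed uniform weighting distribution over states


def backup(P, R, gamma, v, s, a):
    """One-step lookahead value of taking action a in state s."""
    return R[s][a] + gamma * sum(p * v[t] for t, p in enumerate(P[s][a]))


def evaluate_policy(P, R, gamma, policy, sweeps=2000):
    """Approximate v_pi of a deterministic policy by repeated Bellman backups."""
    v = [0.0] * len(P)
    for _ in range(sweeps):
        v = [backup(P, R, gamma, v, s, policy[s]) for s in range(len(P))]
    return v


def optimal_bellman_residual(P, R, gamma, policy, nu):
    """Return sum_s nu(s) * |(T_* v_pi)(s) - v_pi(s)|."""
    v = evaluate_policy(P, R, gamma, policy)
    residual = 0.0
    for s in range(len(P)):
        t_star = max(backup(P, R, gamma, v, s, a) for a in range(len(P[s])))
        residual += nu[s] * abs(t_star - v[s])
    return residual


residual_poor = optimal_bellman_residual(P, R, GAMMA, [0, 0], NU)
residual_greedy = optimal_bellman_residual(P, R, GAMMA, [1, 1], NU)
```

In this toy case the policy that always picks action 1 is optimal, so its residual vanishes up to numerical error, while the policy that always picks action 0 leaves a strictly positive residual.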

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-INSU

HAL Descartes

Hal-Diderot

A Theory of Regularized Markov Decision Processes

Authors: Geist Matthieu, Pietquin Olivier, Scherrer Bruno
Publication date: 04/06/2019

Many recent successful (deep) reinforcement learning algorithms make use of regularization, generally based on entropy or Kullback-Leibler divergence. We propose a general theory of regularized Markov Decision Processes that generalizes these approaches in two directions: we consider a larger class of regularizers, and we consider the general modified policy iteration approach, encompassing both policy iteration and value iteration. The core building blocks of this theory are a notion of regularized Bellman operator and the Legendre-Fenchel transform, a classical tool of convex optimization. This approach allows for error propagation analyses of general algorithmic schemes of which (possibly variants of) classical algorithms such as Trust Region Policy Optimization, Soft Q-learning, Stochastic Actor Critic or Dynamic Policy Programming are special cases. This also draws connections to proximal convex optimization, especially to Mirror Descent.

Comment: ICML 201
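To make the two building blocks mentioned in the abstract above more concrete, here is a minimal sketch for one special case only: the scaled negative-entropy regularizer, whose Legendre-Fenchel transform is the log-sum-exp and whose maximizing policy is the softmax. The toy MDP, the temperature `tau` and all function names are assumptions for illustration; the general theory covers a wider class of regularizers that this sketch does not attempt to represent.

```python
import math


def softmax(q, tau):
    """Policy that maximizes <pi, q> + tau * entropy(pi)."""
    m = max(q)
    weights = [math.exp((x - m) / tau) for x in q]
    total = sum(weights)
    return [w / total for w in weights]


def regularized_max(q, tau):
    """Legendre-Fenchel transform of tau * negative entropy: tau * log sum exp(q / tau)."""
    m = max(q)  # shift by the maximum for numerical stability
    return m + tau * math.log(sum(math.exp((x - m) / tau) for x in q))


def regularized_bellman_backup(P, R, gamma, v, tau):
    """Apply the entropy-regularized optimal Bellman operator once to v."""
    new_v = []
    for s in range(len(P)):
        q = [
            R[s][a] + gamma * sum(p * v[t] for t, p in enumerate(P[s][a]))
            for a in range(len(P[s]))
        ]
        new_v.append(regularized_max(q, tau))
    return new_v


# Check the conjugate identity on an arbitrary q-vector (hypothetical numbers).
q_example = [1.0, 2.0, 0.5]
tau = 0.5
pi = softmax(q_example, tau)
entropy = -sum(p * math.log(p) for p in pi)
direct = sum(p * x for p, x in zip(pi, q_example)) + tau * entropy
conjugate = regularized_max(q_example, tau)

# Hypothetical 2-state, 2-action MDP used only to exercise the backup.
P = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
R = [[0.0, 1.0], [0.0, 2.0]]
soft_v = regularized_bellman_backup(P, R, 0.9, [0.0, 0.0], tau)
```

Here `direct` and `conjugate` agree up to floating-point error, and as `tau` shrinks the regularized maximum approaches the ordinary maximum, which recovers the unregularized Bellman backup.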

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Difference of Convex Functions Programming Applied to Control with Expert Data

Authors: Geist Matthieu, Pietquin Olivier, Piot Bilal
Publication date: 05/09/2016

This paper reports applications of Difference of Convex functions (DC) programming to Learning from Demonstrations (LfD) and Reinforcement Learning (RL) with expert data. This is made possible because the norm of the Optimal Bellman Residual (OBR), which is at the heart of many RL and LfD algorithms, is DC. Improvement in performance is demonstrated on two specific algorithms, namely Reward-regularized Classification for Apprenticeship Learning (RCAL) and Reinforcement Learning with Expert Demonstrations (RLED), through experiments on generic Markov Decision Processes (MDP), called Garnets.

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

HAL-Rennes 1

Machine Learning Methods for Spoken Dialogue Simulation and Optimization

Author: Olivier Pietquin
Publication venue: 'IntechOpen'
Publication date: 01/01/2009

Computers and electronic devices are becoming more and more present in our day-to-day life. This can of course be partly explained by their ability to ease the achievement of complex and boring tasks, the significant decrease in prices, or the new entertainment styles they offer. Yet this real incursion into everybody's life would not have been possible without a significant improvement of Human-Computer Interfaces (HCI). This is why HCI are now widely studied and have become a major research trend in the scientific community. Designing “user-friendly” interfaces usually requires multidisciplinary skills in fields such as computer science, ergonomics, psychology, signal processing, etc. In this chapter, we argue that machine learning methods can help in designing efficient speech-based human-computer interfaces.

IntechOpen

HAL-CentraleSupelec

HAL-Rennes 1

LIG-CRIStAL System for the WMT17 Automatic Post-Editing Task

Authors: Berard Alexandre, Besacier Laurent, Pietquin Olivier
Publication date: 17/07/2017

This paper presents the LIG-CRIStAL submission to the shared Automatic Post-Editing task of WMT 2017. We propose two neural post-editing models: a monosource model with a task-specific attention mechanism, which performs particularly well in a low-resource scenario; and a chained architecture which makes use of the source sentence to provide extra context. This latter architecture manages to slightly improve our results when more training data is available. We present and discuss our results on two datasets (en-de and de-en) that are made available for the task.

Comment: keywords: neural post-edition, attention model

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot